{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Learning sentence representation with Sentence-BERT  \n",
    "\n",
    "Sentence-BERT (SBERT) is introduced by the Ubiquitous Knowledge Processing Lab (UKP-TUDA). As the name suggests, Sentence-BERT is used for obtaining fixed-length sentence representation. Sentence-BERT extends the pre-trained BERT model (or its variants) to obtain the sentence representation. Wait! Why do we need the Sentence-BERT for obtaining sentence representation? We can directly use the vanilla BERT or its variants to obtain the sentence representation, right? Yes! \n",
    "\n",
    "But one of the challenges with the vanilla BERT model is that its high inference time. Say, we have a dataset with  number of sentences then to find a sentence pair with high similarity, it takes about $n(n-1)/2$ computations.\n",
    "\n",
    "To combat this high inference time, we use Sentence-BERT. The Sentence-BERT drastically reduces the inference time of BERT. Sentence-BERT is popularly used in tasks like sentence pair classification, computing similarity between two sentences, and so on. Before understanding how Sentence-BERT works in detail, first, let's take a look at computing sentence representation using the pre-trained BERT model in the next section. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Computing the sentence representation \n",
    "\n",
    "Consider a sentence – Pairs is a beautiful city. Suppose, we need to compute the representation of the given sentence. First, we will tokenize the sentence and add [CLS] token at the beginning and [SEP] token at the end, so our tokens become the following:\n",
    "\n",
    "tokens = [ [CLS], Paris, is, a, beautiful, city, [SEP] ]\n",
    "\n",
    "Now, we feed the tokens to the pre-trained BERT and it will return the representation $R_i$ for each of the token $i$  as shown in the following image:\n",
    "\n",
    "![title](images/1.png)\n",
    "Okay, we have obtained the representation $R_i$ for each of the tokens. How can we obtain the representation of the complete sentence? We learned that the representation of the [CLS] token $R_{[CLS]}$ holds the aggregate representation of the sentence. So, we can use the representation of [CLS] token $R_{[CLS]}$  as the sentence representation: \n",
    "\n",
    "$$\\text{Sentence representation} = R_{\\text[CLS]} $$\n",
    "\n",
    "But the problem with using the representation of [CLS] token as a sentence representation is that the sentence representation will not be accurate especially if we are using the pre-trained BERT model directly without fine-tuning it. So instead of using the representation of [CLS] token as a sentence representation, we can try averaging. That is, we take the average representation of all the tokens and use that as a sentence representation: \n",
    "\n",
    "\n",
    "$$\\text{Sentence representation} = \\frac{\\sum_{i=0}^n {R_{i}}}{n} $$\n",
    "\n",
    "So, the sentence representation becomes: \n",
    "$$\\text{Sentence representation} = \\frac{ R_{\\text{[CLS]}}+R_{\\text{Paris}}+R_{\\text{is}}+R_{\\text{a}}+R_{\\text{beautiful}}+R_{\\text{city}}+R_{\\text{[SEP]}}}{7} $$\n",
    "\n",
    "\n",
    "But instead of averaging, the better approach to compute representation is by pooling. That is, we compute sentence representation by pooling the representation of all the tokens. The mean pooling and max pooling are the two most popularly used pooling strategies. Okay, how mean and max pooling would be useful here? \n",
    "\n",
    "- If we obtain sentence representation by applying mean pooling to the representation of all the tokens, then essentially the sentence representation holds the meaning of all the words (tokens). \n",
    "- If we obtain sentence representation by applying max pooling to the representation of all the tokens, then essentially the sentence representation holds the meaning of important words (tokens). \n",
    "\n",
    "Thus, we can compute the representation of a sentence by pooling the representation of all the tokens as shown in the following image:\n",
    "\n",
    "\n",
    "![title](images/2.png)\n",
    "\n",
    "\n",
    "Now that we learned how to compute sentence representation using a pre-trained BERT model. Let us see how Sentence-BERT works in detail in the next section.  \n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
